ABSTRACT


Objective: The study aimed to investigate the reliability of ChatGPT’s answers to medical questions, including those sourced from patients and guideline recommendations. The focus was on evaluating ChatGPT’s accuracy in responding to various types of infectious disease questions.
Materials and Methods: The study was conducted using 200 questions sourced from social media, experts, and guidelines related to various infectious diseases, including urinary tract infection, pneumonia, HIV, various types of hepatitis, COVID-19, skin infections, and tuberculosis. The questions were arranged for clarity and consistency by excluding repetitive or unclear ones. The answers were based on guidelines from reputable sources such as the Infectious Diseases Society of America (IDSA), Centers for Disease Control and Prevention (CDC), European Association for the Study of Liver Disease (EASL) and Joint United Nations Programme on HIV/AIDS (UNAIDS) AIDSinfo. According to the scoring system, completely correct answers were given 1-point, and completely incorrect ones were given 4-points. To assess reproducibility, each question was posed twice on separate computers. Repeatability was determined by the consistency of the answers’ scores.
Results: In the study, ChatGPT was posed with 200 questions: 93 from social media platforms and 107 from guidelines. The questions covered a range of topics: urinary tract infections (n=18 questions), pneumonia (n=22), HIV (n=39), hepatitis B and C (n=53), COVID-19 (n=11), skin and soft tissue infections (n=38), and tuberculosis (n=19). The lowest accuracy was 72% for urinary tract infections. ChatGPT answered 92% of social media platform questions correctly (scored 1-point) versus 69% of guideline questions (p=0.001; OR=5.48, 95% CI=2.29-13.11).
Conclusion: Artificial intelligence is widely used in the medical field by healthcare professionals and patients. Although ChatGPT answers questions from social media platforms quite properly, we recommend that healthcare professionals be conscientious when using it.

E-mail: gorkemguclurd@gmail.com
Published: February 16, 2024
Suggested citation: [...] in infectious diseases and clinical microbiology? Infect Dis Clin Microbiol. 2024;1:55-9.
ChatGPT as an Infectious Diseases Consultant


INTRODUCTION 


Artificial intelligence models have influenced many branches of science in recent years and are used in various departments of medicine. ChatGPT (Chat Generative Pre-trained Transformer) is a text-based artificial intelligence model developed by OpenAI (1). ChatGPT can be used in many areas of medicine, such as generating medical text, answering medical questions, providing recommendations for diagnosis and treatment, translating medical documents, and holding medical conversations (2).


The use of artificial intelligence in the medical field is increasing day by day. Patients often turn to the internet and social media platforms for quick answers to their medical concerns. Nevertheless, evaluating the quality of the information these platforms provide is very important. Unlike social media platforms, ChatGPT is a system that blends information by accessing it from many reliable sources. Despite having limited access to medical data, ChatGPT performs at the level of a third-year medical student in licensing exams, encouraging discussions on emergency medicine within medicine (3). For example, pediatric urology questions were answered very well in a study conducted using text-based artificial intelligence modeling (5). Although ChatGPT is thought to be promising in producing consistent responses, it is important to determine the accuracy of the medical information it provides. Artificial intelligence can produce misleading information and contribute to an information epidemic called an “infodemic,” which can also threaten public health (4).
There are few studies on the use of artificial intelligence in the field of medicine. To our knowledge, our study is the first on this subject in the field of infectious diseases in Türkiye. We aimed to investigate the reliability and accuracy of ChatGPT’s answers to questions about infectious diseases.


MATERIALS AND METHODS 


A total of 200 questions were collected from social media platforms (YouTube, X [formerly named Twitter], Facebook), from questions directed to infectious disease societies and specialists, and from infectious disease guidelines. A social media question was defined as a question derived from social media. A guideline question was defined as a question derived from various infectious diseases guidelines. Of those, 93 were obtained from social media platforms and 107 from guidelines. The questions and the responses given by the experts are shown in the Supplementary Table. Questions about urinary tract infection, pneumonia, HIV, hepatitis B, hepatitis C, COVID-19, skin and soft tissue infections, and tuberculosis were included in the study. Our study did not include patient data, so ethics committee approval was not obtained.


Social media platform questions were selected from the questions posed to infectious disease associations and experts via social media platforms between 1 and 30 September 2023. Questions that were repetitive, contained grammatical errors, or had unclear answers were not included. The responses to the guideline questions were obtained from the Infectious Diseases Society of America (IDSA), Centers for Disease Control and Prevention (CDC), European Association for the Study of Liver Disease (EASL) and Joint United Nations Programme on HIV/AIDS (UNAIDS) AIDSinfo guidelines, the Turkish Thoracic Society Community-Acquired Pneumonia and Tuberculosis guidelines, and the Skin-Soft Tissue Infections Consensus Report. Responses to questions prepared from the guidelines were mostly ‘high level of evidence’ and ‘strong recommendation’. In addition, questions covering the main topics were asked using the questions prepared from international and national guidelines. The questions included in the study were prepared in Turkish. They were posed to ChatGPT in Turkish, and answers were obtained in Turkish from ChatGPT. The questions were translated into English to be included in the Supplementary Table in this manuscript.


HIGHLIGHTS 


• While 92% of the questions from social media platforms received a 1-point response, 69% from guidelines received a 1-point response.

• ChatGPT achieved the highest correct answer rate in tuberculosis questions.

• When 1 and 2-point answers were evaluated together, ChatGPT answered social media platform questions more accurately than guideline questions.
Table 1. Scoring the responses to guideline and social media platform questions.

                                                                                                Score
Parameters                              Guideline questions  Social media platform questions  1-point    2-points   3-points  4-points  Total
Total (n, %)                            107 (53.5)           93 (46.5)                        160 (80)   28 (14)    8 (4)     4 (2)     200
Urinary tract infections (n, %)         10 (55.5)            8 (44.5)                         13 (72.3)  4 (22.2)   1 (5.5)   0 (0)     18
Pneumonia (n, %)                        11 (50)              11 (50)                          17 (77.3)  2 (9)      3 (13.7)  0 (0)     22
HIV (n, %)                              17 (43.6)            22 (56.4)                        29 (74.4)  8 (20.5)   2 (5.1)   0 (0)     39
Hepatitis B (n, %)                      25 (69.4)            11 (30.6)                        30 (83.3)  5 (13.9)   0 (0)     1 (2.8)   36
Hepatitis C (n, %)                      12 (70.6)            5 (29.4)                         13 (76.5)  3 (17.6)   0 (0)     1 (5.9)   17
COVID-19 (n, %)                         1 (9)                10 (91)                          9 (82)     2 (18)     0 (0)     0 (0)     11
Skin and soft tissue infections (n, %)  31 (81.6)            7 (18.4)                         32 (84.2)  3 (7.9)    2 (5.3)   1 (2.6)   38
Tuberculosis (n, %)                     0 (0)                19 (100)                         17 (90)    1 (5)      0 (0)     1 (5)     19
Guideline                               107 (100)            0 (0)                            74 (69.2)  23 (21.5)  8 (7.5)   2 (1.8)   107
Social Media                            0 (0)                93 (100)                         86 (92.5)  5 (5.4)    0 (0)     2 (2.1)   93
Completely correct answers were scored as 1-point, correct but insufficient answers as 2-points, mixed or misleading answers as 3-points, and completely wrong answers as 4-points. To evaluate the reproducibility of the answers, each question was asked twice on different computers. Repeatability was defined as the consistency of two answers with similar rating categorizations. If the answer given to the repeated question was in a different score category or contained information at a different level of detail, it was considered negative in terms of repeatability. The answers were evaluated by two separate infectious disease specialists on two separate computers with the ChatGPT-4 September Update version. Where their evaluations differed, the final decision was made after review by a third expert. The proportions of answers receiving 1 and 2-points for the social media platform and guideline questions were compared with the chi-square test to assess whether there was a significant difference.


We conducted statistical analyses using MS Excel 16.0 (Microsoft Corp., USA). The scores assigned to the answers were presented as percentages. Categorical data were presented as numbers and percentages. The chi-square test was used to compare categorical data. Statistical significance was set at p<0.05.


RESULTS


In our study, a total of 200 questions were posed to ChatGPT, of which 93 were from social media platforms and 107 were from guidelines. Eighteen questions were asked about urinary tract infection, 22 about pneumonia, 39 about HIV, 53 about hepatitis B and C, 11 about COVID-19, 38 about skin and soft tissue infection, and 19 about tuberculosis (Table 1).
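
The per-topic rates of 1-point answers can be recomputed from the counts in Table 1. The sketch below is illustrative only: the dictionary layout and the helper name `one_point_rate` are assumptions introduced here, and the counts are copied from Table 1.

```python
# Counts copied from Table 1: topic -> (answers scored 1-point, total questions).
# The data layout and helper name are illustrative, not part of the study.
TABLE_1 = {
    "Urinary tract infections": (13, 18),
    "Pneumonia": (17, 22),
    "HIV": (29, 39),
    "Hepatitis B": (30, 36),
    "Hepatitis C": (13, 17),
    "COVID-19": (9, 11),
    "Skin and soft tissue infections": (32, 38),
    "Tuberculosis": (17, 19),
}


def one_point_rate(correct: int, total: int) -> float:
    """Share of answers scored 1-point, as a percentage."""
    return 100.0 * correct / total


rates = {topic: one_point_rate(c, n) for topic, (c, n) in TABLE_1.items()}
best = max(rates, key=rates.get)    # topic with the highest 1-point rate
worst = min(rates, key=rates.get)   # topic with the lowest 1-point rate

# Print topics from highest to lowest rate.
for topic, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{topic}: {rate:.1f}%")
```

With these counts, tuberculosis has the highest rate of 1-point answers and urinary tract infections the lowest.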


The highest correct answer rate, 90%, was achieved in questions on tuberculosis, as 17 of 19 answers got 1-point. The lowest rate (72%) was for urinary tract infections. The mean±standard deviation (SD) score for ChatGPT-generated responses was 1.11±0.48 for questions from social media platforms and 1.42±0.71 for guideline questions. While 92% of the questions from social media platforms received a 1-point response, 69% from guidelines received a 1-point response. When 1-point answers were compared between the two question groups, the difference was statistically significant (p=0.001; OR=5.48, 95% CI=2.29-13.11) (Table 1).
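
The comparison of 1-point answers can be checked from the 2×2 counts in Table 1 (86 of 93 social media platform questions versus 74 of 107 guideline questions). The sketch below is a minimal illustration, not the authors' procedure: it assumes an uncorrected Pearson chi-square test and a Woolf (log-based) Wald interval with z = 1.96, neither of which the article specifies, and the function names are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a, b: 1-point and other answers in group 1; c, d: the same for group 2.
    The interval method is an assumption; the article does not name one.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


def pearson_chi_square(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 1-point answers: social media 86 of 93, guidelines 74 of 107 (Table 1).
or_1pt, ci_low, ci_high = odds_ratio_ci(86, 7, 74, 33)
chi2_1pt, p_1pt = pearson_chi_square(86, 7, 74, 33)
print(f"OR = {or_1pt:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
print(f"chi-square = {chi2_1pt:.2f}, p = {p_1pt:.5f}")
```

Under these assumptions the odds ratio and interval agree with the reported OR=5.48 and 95% CI=2.29-13.11. The exact p-value depends on the test variant and software used, so it need not match the reported figure to the last digit.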


A total of 86 (92.5%) of the answers given to the 93 social media platform questions were evaluated as 1-point, and 5 (5.4%) were evaluated as 2-points. Of the 107 guideline questions, 74 (69.2%) received 1-point, and 23 (21.5%) received 2-points. When the answers with 1 and 2-points were evaluated together, the difference between the social media platform questions and the guideline questions was significant (97.8% vs 90.7%; p=0.049; OR=4.69, 95% CI=1.00-21.98) (Table 2).


Table 2. Comparison of the rates of correct responses for guideline and social media platform questions.

Parameters             Total      Guideline (n=107)  Social media (n=93)  OR    95% CI      p
1-point (n, %)         160 (100)  74 (46.3)          86 (53.7)            5.48  2.29-13.11  0.001
1 and 2-points (n, %)  188 (100)  97 (90.7)          91 (97.8)            4.69  1.00-21.98  0.049

OR: Odds ratio, CI: Confidence interval.


DISCUSSION


ChatGPT has a wide information network and generally answers medical questions accurately. Our study investigated the reliability of ChatGPT’s answers to questions, including patients’ questions and guideline recommendations. ChatGPT provided correct answers to questions about different types of infections at different rates. The high accuracy rate for tuberculosis shows that ChatGPT provides more accurate information about certain types of infections; however, it gave poorer answers to questions about urinary tract infections. Although accuracy rates differed between subjects, the rates of questions receiving 1 and 2-points in both question groups were over 90% in our study. This showed that ChatGPT’s rate of correct answers was high, although some answers were deficient.


A similar rate was obtained in the study of Caglar et al., which included 137 questions about pediatric urology; 92% of the questions were answered correctly by ChatGPT (5). The same study stated that 5.1% of the responses to all questions were correct but insufficient, and 2.9% contained correct information along with misleading information (5). In another study conducted in South Korea, in which 79 medical school exam questions were evaluated using the ChatGPT January 1, 2023 version, the performance of ChatGPT was lower than that of medical students (6). In another study conducted in 2023, it was observed that ChatGPT improved clinical workflow and radiology services in radiological decision-making (7). Dave et al. demonstrated the feasibility of using this potential of artificial intelligence in the field of medicine (8).


Tuncer G, Gigli KG.


Nevertheless, some studies in the literature show that ChatGPT cannot provide appropriate answers to medical questions. It has been shown that the accuracy level of ChatGPT may be affected by the quality of information sources, with a significant difference between social media platform and guideline questions. In this regard, it can be said that text-based artificial intelligence should obtain information from more reliable sources. ChatGPT answered social media platform questions correctly at higher rates than guideline questions; its performance on the latter may have been lower because the guideline information was more complex and specific.


Overall, the rate of correct answers of ChatGPT to the guideline questions was relatively high, but healthcare professionals should be careful while using it. The accuracy of the information provided by ChatGPT must be checked against the guidelines. A recent study by Singhal et al. indicated that programs specific to the medical field are being developed (9). They reported that, using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding [MMLU] clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. Nonetheless, they concluded that the resulting model, Med-PaLM, performs promisingly but remains inferior to clinicians (9). In another study, ChatGPT interpreted the clinical evaluation of 36 patients, and an overall accuracy of 71.7% was achieved (10). While ChatGPT showed high performance with 76.9% accuracy in making the final diagnosis, it showed the lowest performance with 60% accuracy in creating a differential diagnosis. ChatGPT performed worse on differential diagnosis-type questions than on questions about general medical information (p=0.02) (10). It may be concluded that ChatGPT lacks the medical expertise and context required to fully understand the complex relationship between different conditions and treatments (11). Although it is a powerful text-based artificial intelligence model, it has some limitations, such as reasoning, establishing context, and limited text input. Because it cannot establish as comprehensive a context as humans and depends on the reliability of its sources, it may not always produce correct answers.


In conclusion, healthcare professionals and patients widely use artificial intelligence in the medical field. Although ChatGPT answers social media questions well, we recommend that healthcare professionals be conscientious when using ChatGPT. Given these considerations, the direct use of ChatGPT in the field of infectious diseases carries associated risks in its current state and necessitates active verification of information by users. Although there were certain limitations specific to infectious disease medicine, the results of this study indicate that ChatGPT’s medical knowledge has expanded and suggest its potential to handle specific medical questions in the future.